PROTEIN ANALYSIS

The present invention relates to methods for analysing mixtures of proteins. In 
particular, the invention relates to methods to compare proteins between different 
cells and tissues. The invention involves the fractionation of proteins or peptides 
derived from proteins and subsequent analysis of mass, preferably by mass 
spectrometry. 

Current methods to analyse en masse complex mixtures of proteins such as in 
mammalian cells or tissues require that the proteins are separated by technologies 
such as two dimensional (2D) gel electrophoresis. For this technology, cellular 
proteins are usually separated on the basis of charge in one dimension and on the 
basis of size in the other dimension. Proteins can either be identified with reference 
to the electrophoresis migration pattern of a known protein or by elution of the 
protein from the electrophoretically separated spot and analysis by methods such as 
mass spectrometry and nuclear magnetic resonance. However, limitations of the 2D 
protein gel method include the limited resolution and detection of proteins from a 
cell (typically only 5000 cellular proteins are clearly detected), the limitation to 
identification of separated proteins (for example, mass spectrometry usually requires
100 fmoles or more of protein for identification), the specialist nature of the
technique and the difficulty in automating the technique in order to achieve very 
high protein analysis throughputs. There is thus a need for superior methods to 
analyse complex mixtures of proteins en masse especially using methods without gel 
electrophoresis and methods which are easy to automate. 



The present invention provides for methods to analyse complex mixtures of proteins 
by fractionation of these mixtures of proteins (or derived peptide fragments) and 
subsequent analysis of mass, especially using mass spectrometry. In particular, the 
invention provides for methods to reduce the complexity of protein or peptide
mixtures in order to limit the presence of overlapping peaks detected by mass 
spectrometry thus enabling the comprehensive analysis of large protein or peptide 
mixtures such as those derived from whole proteomes. 

In one major aspect of the present invention, proteins are either digested or cleaved
into smaller peptide fragments and then subjected to mass analysis especially by 
mass spectroscopy. The principle of the invention is to apply one or more protein or 
peptide fractionation steps to limit the complexity of the protein or peptide mixture 
being subjected to mass analysis, typically as the mass-to-charge ratio
measured by mass spectroscopy. Optionally, proteins or peptide fragments may also
be conjugated with a "chemical tag" to assist in fractionation. Such a chemical tag 
may be added either before or after protein digestion/cleavage. One preferred 
method for fractionation of peptides is by using affinity reagents such as antibodies
or solid phases or reactive chemical groups to isolate specific peptides or mixtures of
peptides for subsequent mass analysis. Affinity reagents such as monoclonal or
polyclonal antibody preparations can be used to retrieve individual peptides or sets 
of peptides from the peptide mixture for subsequent mass analysis. Alternatively or 
additionally, affinity reagents can be used to eliminate peptides from the mixture
whereby the mixture is itself subsequently subjected to mass analysis. The affinity
reagents can either bind by virtue of specific sequences or structures in peptides or 
by virtue of specific chemical groups either as natural constituents of the peptides or 
as chemical tags which are added to the peptides either before or after cleavage. 

In one embodiment of the invention, for analysis of larger mixtures of peptides, panels
of mixed antibodies such as those provided by recombinant libraries of antibody
variable region fragments (including single-chain antibodies) are used in order to
isolate subsets of peptides for subsequent analysis. Such panels of monoclonal
antibodies will include a wide range of peptide specificities which are achieved, for
example, by pre-absorbing antibody libraries on the peptide samples of interest or by 
immunising animals with peptide samples of interest and collecting polyclonal 
antisera or generating panels of monoclonal antibodies. Then individual or mixtures 
of the selected antibodies are used to isolate (or eliminate) the specific subsets of
peptides from a test sample. Subsequent mass analysis of a range of peptides can
facilitate the detection of differences in specific proteins between test samples. 

In another embodiment of the invention, fractionation of peptides is achieved using 
affinity reagents other than antibodies. Generation of antibodies to all peptides in a 
mixture is difficult and is highly dependent on the number of peptides in a mixture
and the facility for individual peptides to be bound with reasonable affinity to
antibodies ("antigenicity"). With a very large peptide mixture, a limitation is 
redundancy whereby antibodies with the same peptide specificities are repeatedly
represented whilst antibodies to other peptide specificities are underrepresented or
absent. This may cause a particular protein to not be mass analysed if none of the 
peptides from a particular protein are bound by an antibody. As an alternative to 
using antibodies to fractionate, chemical tags are added to proteins or peptides 
permitting subsequent isolation of the tagged peptides. Unique chemistries are 
available for attachment of ligands to several specific amino acids, for example to 
the ε-amino groups of lysines, the thiol groups of cysteines and the carboxyl groups
of aspartic and glutamic acids. One advantage of isolating peptides by virtue of 
chemical tags is that a selection is made for larger peptides which are more likely to
contain a specific amino acid to which a tag is attached thus isolating peptides with a 
mass which exceeds low molecular weight masses with a larger background noise 
during mass analysis. Another advantage is the array of reagents already available 
to introduce chemical tags onto specific amino acids within proteins or peptides 
especially reagents which provide a biotin tag. An additional highly advantageous 
step, especially with large peptide mixtures, is to further fractionate the peptides
prior to mass analysis either using conventional separation technologies such as ion
exchange or size exclusion chromatography or using other affinity reagents which 
either recognise specific peptide sequences or which recognise other chemical tags 
on the peptides. 
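
The size bias described above can be illustrated numerically. The following is a minimal sketch, assuming that residues occur independently along a peptide; the function name and the residue frequency of 0.02 are hypothetical and chosen purely for illustration, not taken from this description.

```python
def fraction_containing_residue(peptide_length, residue_frequency):
    """Estimate the chance that a peptide carries at least one taggable residue.

    Assumes every position is independently the residue of interest with
    probability `residue_frequency` (an illustrative simplification).
    """
    if peptide_length < 0 or not 0.0 <= residue_frequency <= 1.0:
        raise ValueError("length must be >= 0 and frequency within [0, 1]")
    return 1.0 - (1.0 - residue_frequency) ** peptide_length


if __name__ == "__main__":
    assumed_frequency = 0.02  # hypothetical frequency of the tagged amino acid
    for length in (5, 10, 20, 40):
        share = fraction_containing_residue(length, assumed_frequency)
        print(f"peptides of length {length:2d}: {share:.3f} expected to carry a tag")
```

Under these assumptions the expected tagged share rises with peptide length, which is the selection for larger peptides noted in the text.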

As an alternative to adding a chemical tag to a specific amino acid side group, tags 
can be added specifically at the terminus of the protein molecule as described
previously in the art. N or C terminal peptides (or both) from a protein can then be
separated by preabsorption of the protein to a solid phase via its N and/or C terminus
prior to cleavage or by chemical tagging of the N and/or C terminus for subsequent 
isolation after cleavage. In principle, this then should lead to recovery of all N 
and/or C terminus peptides representing all proteins from the sample. Such isolation 
of N and/or C terminal peptides is greatly facilitated by the differential reactive
nature of the N terminal amino group and the C terminal carboxyl group in the 
protein compared to internal amino and carboxyl groups. In practice, the method 
described in the prior art is not particularly useful for subsequent direct mass 
analysis of large mixtures of peptides. The present invention incorporates the
additional step of fractionating such N and/or C terminal peptides prior to mass
analysis using standard peptide fractionation methods such as ion exchange
chromatography or, as described herein, using other affinity reagents which either 
recognise specific peptide sequences or which recognise other chemical tags on the 
peptides. The invention also allows for sequential conjugation of different chemical
tags to the protein / peptide mixture especially where N or C termini are sequentially
exposed by specific cleavage of the protein / peptide and whereby the N or C termini
(or both) are conjugated with a specific chemical tag upon exposure of that terminus.
This aspect of the invention therefore provides for a series of protein fractions with a 
range of conjugated chemical tags introduced at the termini, such fractions being
isolated using an affinity reagent which binds to the tag.

In another aspect, the present invention provides for cleavage of mixtures of proteins 
using proteases or chemical methods and subsequent mass analysis without further
fractionation. In this case, the analysis of protein mixtures is assisted by sequential
cleavage cycles whereby the spectrum of proteins and peptides is analysed
following each cleavage cycle. This method could also include chemical tagging
cycles between cleavage cycles to increase the mass or steps to remove side-groups
such as carbohydrate groups in order to reduce mass. If the mass of the range of
protein fragments is then determined at the end of each cleavage cycle (either with 
or without chemical tagging, cleavage or other modification), then a range of mass 
distributions will be obtained for each cycle. With an appropriate series of mass 
modification cycles, the result for a single protein or a mixture will be a mass
spectrum of protein/peptide fragments which is altered at successive cycles; the
pattern of these alterations will provide a "fingerprint" for the specific
proteins/peptides in the mixture. The appearance and disappearance of a particular
protein/peptide fragment of a certain mass following a specific cleavage cycle with
or without chemical tagging, cleavage or other modifications will provide a
fingerprint for identification of the fragment sequence especially by reference to a
database of such fingerprints. Comparison of the spectrum of protein/peptide 
fragments from different related samples then allows for the identification of 
protein/peptide fragment differences between these samples. Particularly useful in 
this embodiment of the present invention are proteases which specifically recognise
two amino acids and cleave the protein as a result. Examples of such proteases
are the prohormone convertases which cleave between dibasic amino acid pairs. 
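
The cycle-by-cycle fingerprint described above can be sketched in code. This is a minimal illustration under stated assumptions, not the method of the invention itself: the protein sequence is invented, the dibasic rule is simplified to a cut immediately after any pair of K/R residues, the second cycle uses a simplified trypsin-like cut after every K or R, and the names `cleave`, `peptide_mass` and `fingerprint` are hypothetical. The residue masses are standard monoisotopic values.

```python
import re

# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

DIBASIC = r"[KR][KR]"  # assumption: cut directly after a dibasic pair
BASIC = r"[KR]"        # assumption: cut after every K or R


def peptide_mass(sequence):
    """Monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def cleave(fragments, site_pattern):
    """Cut each fragment after every non-overlapping match of site_pattern."""
    products = []
    for fragment in fragments:
        start = 0
        for match in re.finditer(site_pattern, fragment):
            cut = match.end()
            if start < cut < len(fragment):  # ignore a site at the very end
                products.append(fragment[start:cut])
                start = cut
        products.append(fragment[start:])
    return products


def fingerprint(protein, cycle_patterns):
    """For each cleavage cycle, list the masses that appeared and disappeared."""
    fragments = [protein]
    previous = {round(peptide_mass(protein), 3)}
    changes = []
    for pattern in cycle_patterns:
        fragments = cleave(fragments, pattern)
        current = {round(peptide_mass(f), 3) for f in fragments}
        changes.append((sorted(current - previous), sorted(previous - current)))
        previous = current
    return changes


if __name__ == "__main__":
    example = "MASKRLEDFKGWRRTVLK"  # invented sequence for illustration
    for cycle, (appeared, disappeared) in enumerate(
            fingerprint(example, [DIBASIC, BASIC]), start=1):
        print(f"cycle {cycle}: appeared {appeared}, disappeared {disappeared}")
```

Comparing the appeared and disappeared mass lists of two related samples, cycle by cycle, is one simple way to look for the fragment differences the text refers to.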

In a related aspect of the present invention, proteins are fractionated prior to
cleavage. For large protein mixtures, particularly those isolated directly from whole
cells or tissues, the pre-fractionation of proteins may be desirable in order to reduce 
the complexity of mixtures subjected to subsequent cleavage, peptide fractionation 
and mass analysis. Whilst affinity reagents can be used which recognise sequences 
or structures in the proteins/peptides directly, this will itself require a complex
library of affinity reagents such as an antibody library and therefore the additional 
use of chemical tags to provide moieties recognised by a set of affinity reagents 
provides an alternative means of using such reagents. More conventional means of 
pre-fractionation include the use of gel electrophoresis either in one or two
dimensions where sections of the gel are isolated and the proteins within then
subjected to cleavage and mass analysis. Other pre-fractionation methods include 
isolation of proteins by virtue of natural modifications such as phosphorylation, 
glycosylation, protein-protein (or peptide) interaction; alternatively, membrane
proteins can be pre-fractionated or proteins from particular compartments within the
cell. Another important pre-fractionation procedure is to remove highly abundant
proteins from the mixture using affinity reagents such as antibodies to bind and 
remove such proteins. As an alternative to pre-fractionation, peptides generated 
after cleavage can also be fractionated by many of these means and also including 
size/charge fractionation methods particularly using HPLC. Such methods are 
particularly useful to fractionate peptides which have already been selected from a
mixture through the application of affinity reagents. In particular, HPLC can be 
interfaced with mass analysis such that peptide fractions from HPLC separation are 
directly subjected to mass analysis. Peptides generated after cleavage can also be
fractionated by virtue of natural modifications using, for example, antibodies which
bind phosphorylated amino acids within peptides. Prefractionation of proteins may 
also be achieved by using affinity reagents such as monoclonal/polyclonal antibodies 
to isolate specific proteins for subsequent cleavage and mass analysis. For such
analysis of larger mixtures of proteins, panels of mixed monoclonal antibodies such
as those provided by recombinant libraries of antibody variable region fragments
(including single-chain antibodies) are preferred in order to isolate subsets of 
proteins or subsets of cleaved peptides for subsequent analysis. Such panels of 
monoclonal antibodies will include a wide range of protein or peptide specificities
which could be achieved, for example, by pre-absorbing antibody libraries on the
mixed protein/peptide sample of interest and then using individual or mixtures of the 
selected antibodies in order to isolate subsets of proteins or peptides. Such analysis 
provides mass spectra for a range of different protein/peptide fractions thus
facilitating detection of differences in specific proteins between samples. 

A further advantage of the use of chemical tags is that the subsequent fractionation 
of peptides by affinity reagents can greatly reduce the number of selected peptides 
from a protein molecule with the rest of the molecule thus being eliminated from the 
mass analysis. An especially convenient method for selective chemical tagging is to 
tag either (or both of) the N and C terminus of the protein molecules in the mixture
and then to digest or cleave the protein molecules with a reasonably selective
reagent such as an amino acid- or sequence-specific protease (such as endopeptidase
Arg-C) or cleavage reagent (such as acid pH to cleave at Asp-Pro). Using an affinity
reagent, N or C terminal peptides (or both) from the original protein could then be
isolated and all internal peptides discarded. This reduction in complexity is then 
sufficient for mass analysis especially using HPLC coupled to a tandem mass 
spectrometer to analyse the peptides en masse in order to identify the individual 
peptides from the mixture. 
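
The reduction in complexity obtained by keeping only terminal peptides can be sketched as follows. This is a hedged illustration only: the sequences are invented, the cleavage rule is simplified to a cut after every arginine (in the manner of endopeptidase Arg-C), and the tagging and affinity isolation steps are represented simply by keeping the first and last piece of each protein. The function names are hypothetical.

```python
def cleave_after(protein, residues="R"):
    """Split a sequence after each occurrence of the given residues."""
    pieces, start = [], 0
    for index, amino_acid in enumerate(protein):
        if amino_acid in residues and index + 1 < len(protein):
            pieces.append(protein[start:index + 1])
            start = index + 1
    pieces.append(protein[start:])
    return pieces


def select_terminal_peptides(proteins, residues="R"):
    """Keep the N and C terminal piece of each protein; count discarded internals."""
    kept, internal_discarded = [], 0
    for protein in proteins:
        pieces = cleave_after(protein, residues)
        kept.append((pieces[0], pieces[-1]))
        internal_discarded += max(len(pieces) - 2, 0)
    return kept, internal_discarded


if __name__ == "__main__":
    mixture = ["MKTRAAGRLLSRPEW", "MGGSW"]  # invented sequences
    terminal, discarded = select_terminal_peptides(mixture)
    print(terminal, "internal peptides discarded:", discarded)
```

A protein with no cleavage site yields a single piece that is both its N and its C terminal peptide, so at most two peptides per protein remain for mass analysis.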

Alternatively, chemical tagging could be performed only after digestion/cleavage, 
for example with the dibasic cutters, the prohormone convertases. This would 
provide for tagging only at one or more internal sites of the original proteins. If the 
protein mixture is then subjected to a second digestion/cleavage step with a different 
enzyme or cleaving reagent, then the size of the tagged peptides would be reduced 
where a cleavage site was present in the original protein. The tagged peptides could 
then be fractionated using an affinity reagent and subjected to mass analysis. 

In another aspect of the current invention, a protein mixture is subjected to cycles of 
tagging, digestion/cleavage and mass analysis, whereby mass analysis is performed 
only on an aliquot of the mixture resultant from use of an affinity reagent binding to 
the specific chemical tag and whereby the master mixture is then subjected to 
tagging with a different chemical tag and digestion/cleavage. This provides 
sequentially a range of different fragments for mass analysis. Another variation on 
the method involves the same initial steps as above but, having exposed new N and 
C termini after cleavage, one (or both) of these new termini can then optionally be 
tagged with a different chemical which thus tags internal sites in the original protein. 



If required, the process could be repeated one or more times with a different protease 
or cleavage reagent, each time with the addition to the N or C terminus of a different 
chemical tag. In one format of the method, the whole mixture of proteins would first 
be tagged with two different chemical groups at each of the N and C terminus and 
then cleaved with a protease, such as one which specifically cuts adjacent to a
specific amino acid, and tagged again at the new N and C termini with two further 
different chemical groups. This would result in a mixture of peptides each with 
chemical tags at the termini. As the N and C terminal peptides would have a 
specific tag, these could then be isolated from the mixture using appropriate affinity
reagents. Internal peptides without either the initial N or C terminal tags could be
isolated using their specific tags. The process of digestion and tagging could then be
repeated to create further peptides with tags. Using specific combinations of affinity
reagents for specific tags, N or C terminal or specific internal peptides from the 
original protein could then be isolated and selected peptides discarded to achieve a
reduction in complexity sufficient for mass analysis. Where chemical tags are added
to two or more amino acid side groups within peptides, sequential use of affinity
tags could isolate fractions of peptides containing specific combinations of amino
acids. For example, if a mixture of peptides of average length of 20 amino acids is
separately tagged at lysine and phenylalanine and the mixture comprises 25% of
peptides which include neither lysine nor phenylalanine, 25% with lysine only, 25%
with phenylalanine only and 25% with both, then the separate or sequential use of
specific affinity reagents either for lysine or phenylalanine will result in 
fractionation of peptides into four equal fractions. In practice, such a fractionation
scheme will favour the binding of larger peptides to affinity reagents as these
peptides are more likely to contain one or more of the specific amino acids tagged. 
This will bias against the very small peptides such as those with molecular weights 
less than 1000 daltons which, when subjected to mass spectrometry analysis, will be 
more likely to coincide with background noise due to fragmented peptides and other 
small molecules. 
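
The four-way fractionation in the lysine/phenylalanine example can be sketched as a simple partition. This is an illustrative sketch only: the peptides are invented, and the presence of a residue in the sequence stands in for successful tagging and capture by the corresponding affinity reagent. The function name and fraction labels are hypothetical.

```python
def fractionate_by_two_tags(peptides, first="K", second="F"):
    """Partition peptides by which of two tagged residues they contain."""
    fractions = {"neither": [], "first_only": [], "second_only": [], "both": []}
    for peptide in peptides:
        has_first = first in peptide
        has_second = second in peptide
        if has_first and has_second:
            key = "both"
        elif has_first:
            key = "first_only"
        elif has_second:
            key = "second_only"
        else:
            key = "neither"
        fractions[key].append(peptide)
    return fractions


if __name__ == "__main__":
    peptides = ["GAVLS", "GAKLS", "GAFLS", "GKFLS"]  # invented sequences
    for name, members in fractionate_by_two_tags(peptides).items():
        print(name, members)
```

Each fraction can then be subjected to mass analysis separately, so that fewer peptides contribute peaks to any one spectrum.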

Where analysis of complex protein mixtures is required such as in mammalian cells 
or tissues, the present invention provides a main method where proteins are 
fractionated either before or after cleavage and the peptides are then mass analysed. 
The fractionation of a complex mixture of proteins or peptides either requires a 
correspondingly complex mixture of affinity reagents or one or more affinity 
reagents which can recognise features of the proteins/peptides which are the basis
for fractionation. Where cleavage is conducted prior to fractionation, the most 
common method used in the present invention is to cleave the whole protein mixture 
with a protease such as trypsin or V8 (Glu-C) protease and to then selectively isolate 
and mass analyse certain peptides. Commonly, N or C terminal peptides (or both) 
from the peptide mixture are isolated typically by adding a chemical tag to the N 
and/or C terminus of the proteins prior to cleavage and using an affinity reagent
which isolates peptides with the chemical tag. Alternatively, specific peptides (N / 
C terminal or otherwise) can be isolated using affinity reagents which have been 
selected for binding to specific peptides within specific proteins; these will then 
select out those peptides from the mixture for subsequent mass analysis. For more
complex mixtures of proteins, a further fractionation step such as HPLC
fractionation based on size, charge or hydrophobicity is preferred prior to mass 
analysis especially as this can be interfaced with mass analysis. Selective isolation 
of peptides then allows for comparative analysis of specific peptides derived from 
alternative protein mixtures for their relative quantities (relating to relative levels of
the proteins in their respective mixtures) and, in certain cases, for modifications of 
the peptides. 
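
The comparative analysis of two samples can be sketched as a matching of peak lists. This is a minimal sketch under stated assumptions: each spectrum is reduced to a list of (mass-to-charge, intensity) pairs, peaks are matched greedily within a fixed tolerance, and the intensity ratio is taken as a rough indicator of relative quantity. The data, the tolerance and the function name are hypothetical.

```python
def compare_peaks(peaks_a, peaks_b, tolerance=0.02):
    """Match peaks between two samples and report intensity ratios (b / a).

    Returns (matched, only_a, only_b) where matched is a list of
    (mz_in_a, ratio) and the other two are lists of unmatched m/z values.
    """
    matched, only_a = [], []
    unused_b = list(peaks_b)
    for mz_a, intensity_a in peaks_a:
        candidates = [p for p in unused_b if abs(p[0] - mz_a) <= tolerance]
        if candidates:
            best = min(candidates, key=lambda p: abs(p[0] - mz_a))
            unused_b.remove(best)
            matched.append((mz_a, best[1] / intensity_a))
        else:
            only_a.append(mz_a)
    only_b = [mz for mz, _ in unused_b]
    return matched, only_a, only_b


if __name__ == "__main__":
    sample_a = [(500.25, 100.0), (742.40, 50.0)]  # invented peak lists
    sample_b = [(500.26, 200.0), (901.50, 30.0)]
    print(compare_peaks(sample_a, sample_b))
```

Peaks found in only one sample, or matched peaks with a ratio far from one, are candidates for proteins that differ between the two mixtures.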

For isolation of N or C terminal peptides, the preparation and use of affinity reagents
is one important aspect of the present invention and the labelling of the N or C
terminus of proteins is another important aspect. With a typical mixture of proteins
from mammalian cells or tissues or from many living organisms, several of the N
termini of these proteins (and some C termini) will be modified (for example, by 
methylation) such that addition of a chemical tag to the terminus may be blocked. In 
addition, in a typical mixture of proteins from mammalian cells or tissues or from
many living organisms, the proteins will occur at different relative levels of
abundance including, commonly, certain highly abundant proteins. Where protein
mixtures from mammalian cells or tissues or from other living organisms are used
for the initial selection of affinity reagents, such highly abundant proteins may
dominate selection of affinity reagents and may be predominant in the final peptide
mixture for mass analysis. A solution to both of these problems is to use an artificial 
source of mixed proteins to isolate the affinity reagents. Typically, this will be a 
gene expression system whereby a gene (usually cDNA) library is used to generate
the proteins without N or C terminal modifications. In addition, the use of a gene
expression system allows the gene library to be "normalised" to reduce or remove 
highly abundant genes within the library. This is typically achieved by self- 
annealing of the DNA (or RNA) prior to constructing the library. Therefore, a 
common method in the present invention is to generate proteins by expression of
gene libraries (usually normalised) resulting in proteins free from significant N or C
terminal modifications and, where normalised, resulting in a protein mixture free 
from domination by specific proteins. A typical expression system used with gene 
libraries is in vitro transcription and translation using a eukaryotic ribosome
preparation; this also provides the possibility of incorporating modified amino acids
into the expressed proteins. The expressed protein mixture can then be used directly 
for N or C terminal labelling. Other expression systems could also be used where N 
terminal amino groups or C terminal carboxyl groups are not modified or prevented 
from subsequent chemical tagging. Where modification occurs, in some cases the N
terminal modification can be removed either using enzymes such as histone
deacetylase or chemical methods such as limited cyanogen bromide cleavage to 
remove N terminal methionines. Having produced a mixture of proteins free from 
N/C terminal modification, chemical tags can then be added to the N/C terminal 
amino group(s). For the N terminus, the ε-amino group of lysines can be initially
blocked using reagents such as citraconic anhydride or methyl acetimidate to then
allow only the N terminal amino groups to react. Alternatively, the ε-amino group
of lysines can be blocked by incorporating modified lysines into the expression
system such as in vitro transcription / translation whereby, for example, biotin-
modified lysines can be directly incorporated instead of lysines. Chemical tags can
then be added selectively to the N terminus of proteins, for example using 
isothiocyanates of specific molecules to which an affinity reagent is available. One
such example is fluorescein which is incorporated by reaction of the proteins with
fluorescein isothiocyanate allowing subsequent purification with anti-fluorescein
antibodies. Alternatively, polycarboxylic chelating agents can be incorporated as
isothiocyanates allowing subsequent purification with specific metals. Once the N
and/or C termini of proteins in the mixture are tagged, the protein is then 
comprehensively and specifically cleaved either chemically or enzymatically, using
proteases such as trypsin or another cleaving agent. Such cleavage thereby releases
from each protein an individual tagged terminal peptide fragment, a collection of such
fragments which can then be purified from the mixture of untagged peptides using
an appropriate affinity reagent such as an antibody specific for the chemical tag. If 
required, the size of the chemical tag can be increased in order to produce a larger
mass for analysis; this would be useful for peptide fragments resulting from cleavage
very close to the chemical tag whereby the resultant fragment might be so small as 
to be mass analysed within lower molecular weight "noise". The chemical tag 
might, for example, comprise a piece of nucleic acid attached to the peptide via a 
reactive group introduced during synthesis of the nucleic acid. Such a nucleic acid
molecule might also be useful for isolation of the tagged peptide via annealing of the
nucleic acid to a complementary sequence.

Following chemical tagging and isolation, the recovered mixture of N/C terminal
peptides is then used as a "bait" for the isolation of affinity reagents to bind to these
same peptides from proteins derived directly from mammalian cells or tissues or 
from other living organisms. Such affinity reagents will typically derive from a 
library of single chain antibodies displayed as part of a particle containing the 
corresponding gene encoding the antibody. Examples of such particles are ribosome
display particles or phage display particles, in each case where the genes from 
selected antibodies can be rescued in order to propagate those specific antibodies. 
As an alternative, large arrays of antibodies (such as recombinant single chain or 
Fabs, Fvs) can be screened using the N/C terminal peptide mixture and antibodies
which display binding to the peptides can be recovered via the corresponding genes.
As another alternative, N and/or C terminal peptides could be used to directly
generate polyclonal or monoclonal antibodies by appropriate immunisation of an
animal. By these means, a mixture of affinity reagents is selected which can then be 
used for the analysis of mixtures of proteins such as from mammalian cells or tissues
or from other living organisms. Such analysis can either involve using the mixture
of affinity reagents to select out N/C terminal peptides from proteins derived from 
mammalian cells or tissues or from other living organisms or using individual 
affinity reagents to select out individual peptides. The selected peptides can then be 
mass analysed typically by MALDI-ToF (matrix-assisted laser desorption/ionisation
time-of-flight) where the individual peptides give individual mass-to-charge ratios
which can then be used to identify the peptide amino acid constituents. MS-MS
(double mass spectroscopy) peptide sequencing can subsequently be used to identify
the peptide if it can be isolated. Alternatively, the new generation of Quadrupole-
ToF LC-MS-MS ("Q-ToF") instruments can provide for sequential MALDI-ToF and
MS-MS within the same instrument. Indeed, affinity reagents either individually or
in mixtures can be immobilised either indirectly or directly onto the desorption chip 
inserted into the MALDI-ToF instrument and peptides can be subsequently bound
via the affinity reagents on the chip. In this way, multiple peptide fractions adsorbed
by multiple affinity reagents at different loci can be analysed on a single chip. The
use of recombinant proteins as the "bait" to isolate affinity reagents also provides the
prospect of attaching other tags to those proteins whereby the tags are encoded by 
the gene sequence; for example, a C terminal polyhistidine tag (allowing subsequent
purification of the tagged fragments using nickel chelates) could be incorporated, for
example through PCR-mediated incorporation into the gene sequences. 
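
The relationship between a peptide's mass and the mass-to-charge ratio reported by the instrument can be sketched as follows. This is a minimal sketch assuming protonated positive ions; MALDI commonly yields singly charged ions, so a charge of 1 is used as the default. The function names are hypothetical and the example mass is arbitrary.

```python
PROTON = 1.00728  # mass of a proton in daltons


def mass_to_charge(neutral_mass, charge=1):
    """m/z of a peptide carrying `charge` protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON) / charge


def neutral_mass_from_mz(mz, charge=1):
    """Recover the neutral peptide mass from an observed m/z value."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return mz * charge - charge * PROTON


if __name__ == "__main__":
    example_mass = 1000.0  # arbitrary neutral mass in daltons
    for z in (1, 2, 3):
        print(f"charge {z}: m/z = {mass_to_charge(example_mass, z):.4f}")
```

The recovered neutral mass is the quantity that would be compared against calculated peptide masses when assigning a peak to a peptide.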



The use of recombinant proteins as the "bait" to isolate affinity reagents also 
provides another common method of the present invention for specifically isolating
peptides using tags encoded by the recombinant proteins. Such tags can be
conveniently incorporated into members of a gene (usually cDNA) library during
its construction or into individual clones or groups of clones thereof using specific
PCR primers encoding such tags and designed to incorporate such tags into the
resultant expressed proteins. Preferably, such tags will be incorporated into the
expressed proteins in all reading frames in order to produce a productively tagged
protein. Such tags will preferably be incorporated via the downstream primer of a
PCR reaction with the usual result that the tag is produced towards the C terminal
end of the expressed protein (although upstream termination codons may prevent
this in some clones). However, tags may also be incorporated at the N terminal end
or in both N and C termini. 

For the isolation of specific peptides from a peptide mixture, the peptide sequences 
can be produced synthetically (or via recombinant DNA) and then, as above, used as
the "bait" to capture specific affinity reagents. These affinity reagents can then be
used to isolate these same peptides from a cleaved protein mixture derived from, for
example, mammalian cells or tissues or from other living organisms. 

As an alternative to selectively fractionating N or C terminal peptides or specific
internal peptides, modified peptides can be fractionated such as peptides including 
phosphorylated amino acids which can be isolated using antibodies which 
selectively bind to phosphorylated amino acids (tyrosine, threonine or serine or 
combinations thereof) or using immobilised Fe3+ to trap negatively charged
peptides. Similarly, peptides modified by glycosylation and other modifications can
be isolated, in some cases where the peptide modification is further derivatised in 
order to facilitate isolation. For example, carbohydrates can readily be modified via 
periodate reactions as an intermediate to adding chemical tags such as fluorescein. 
A particularly important aspect of the invention is the fractionation of selectively
modified peptides whereby such peptides are selectively tagged by virtue of their
differential exposure to tagging within the original protein environment prior to 
cleavage. For example, surface exposed proteins on living cells can be selectively 
tagged, for example with biotin, by treating the cells with a tagging agent which
preferentially reacts with specific amino acid groups. An indirect method for
achieving such tagging in proteins which are naturally tagged via other stimuli 
within cells is to apply such stimuli in order to effect tagging of the proteins. For 
example, receptor-associated tyrosine kinase molecules within cells can potentially 
be tagged (for example, phosphorylated) by addition of the receptor ligand to those
cells. Following modification, peptides are released from proteins by cleavage and 
then directly mass analysed or subjected to other fractionation as above prior to 
analysis. 

In another aspect of the present invention, and as an alternative to selectively 
fractionating peptides, whole proteins can be tagged and then fractionated without 
digestion or cleavage for subsequent mass analysis. Selective tagging agents can be 
used to tag proteins by virtue of specific natural protein modifications or states, 
including phosphorylated proteins which can be isolated using affinity reagents 
which selectively bind to phosphorylated amino acids (tyrosine, threonine or serine 
or combinations thereof). Selective tagging can also be achieved by tagging certain 
amino acid side-groups exposed on the surface of certain proteins, such as free thiols 
from cysteine residues and free amino groups from lysine residues, and then using 
affinity reagents to selectively isolate the tagged proteins. Proteins can also be 
selectively fractionated by virtue of glycosylation and other modifications, in some 
cases where the protein modification is tagged in order to facilitate isolation. A 
particularly important aspect of the invention is the fractionation of selectively 
tagged proteins whereby such proteins are selectively tagged by virtue of their 
differential exposure to tagging within the original protein environment prior to 
fractionation. For example, surface exposed proteins on living cells can be 
selectively tagged, for example with biotin, by treating the cells with a tagging agent 
which preferentially reacts with specific amino acid groups. Also, proteins which are 
naturally tagged via other stimuli within cells can be tagged by applying such 
stimuli. For example, receptor-associated tyrosine kinase 
molecules within cells can potentially be tagged (for example, phosphorylated) by 
addition of the receptor ligand to those cells. Following modification, proteins are 
fractionated and then mass analysed or subjected to other fractionation as above 
prior to analysis. 



Mass analysis of proteins and peptides by the present invention is preferably 
performed using mass spectrometry. In particular, MALDI-ToF analysis has the 
capability to measure very accurately specific mass:charge ratios for individual 
peptides. This method has the capability for simultaneous analysis of thousands of 
peptides. Above 4kD, the resolution of individual peptides (and proteins) becomes 
poorer such that cleavage of proteins into peptide fragments is necessary in order to 
provide fine resolution. Recent methods of interfacing liquid chromatography 
separation methods (such as HPLC) with tandem mass spectrometry have already 
permitted the mass spectrum analysis of protein mixtures comprising up to 200 
proteins. As such proteins are analysed following protease digestion, if an average 
of ten peptides per protein is assumed, then the method can analyse up to 2000 
peptides. Using methods of the present invention whereby, for example, only tagged 
N terminal peptides are analysed, up to 2000 N terminal peptides derived from 
up to 2000 proteins could be analysed at any one time. As this capacity is not 
sufficient for an en masse analysis of mammalian proteins from cells (typically 50,000 
per cell), peptides have to be segregated into at least 25 fractions in order for 
these fractions all to be analysed. Such further fractionation can be achieved by the 
direct use of affinity reagents to label internal ends after successive protein 
digestion/cleavage steps following which specific affinity reagents are used to 
fractionate peptides according to their tags. As an alternative to standard mass 
spectrometry, MALDI-ToF can be used to produce protein mass profiles which can 
be compared for protein mixtures from different cells. 
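
The fractionation arithmetic in the preceding paragraph can be checked with a short calculation. The following is an illustrative sketch only: the function name and parameters are hypothetical, and the figures used (about 2000 peptides per analysis, about 50,000 proteins per cell, one tagged N terminal peptide per protein or about ten peptides per protein in an untagged digest) are the assumptions stated in the text.

```python
import math

def fractions_needed(proteins_per_cell: int, peptides_per_run: int,
                     peptides_per_protein: int = 1) -> int:
    """Minimum number of fractions so that each fraction fits in one analysis."""
    total_peptides = proteins_per_cell * peptides_per_protein
    return math.ceil(total_peptides / peptides_per_run)

# One tagged N terminal peptide per protein, 2000 peptides per analysis.
n_terminal_only = fractions_needed(50_000, 2_000)

# Untagged digest, assuming roughly ten peptides per protein as in the text.
whole_digest = fractions_needed(50_000, 2_000, peptides_per_protein=10)

print(n_terminal_only, whole_digest)
```

Under these assumptions, restricting analysis to one tagged peptide per protein reduces the number of fractions required tenfold relative to analysing every peptide of a digest.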

Chemical tags are typically moieties which can be covalently attached to proteins, 
usually at the N or C terminus. Chemical tagging of the N terminus is 
commonly undertaken at the terminal amine group. If it is necessary to avoid 
tagging of the ε-amino group of lysines, then these can be initially blocked using 
reagents such as citraconic anhydride or methyl acetimidate. Terminal amine groups 
are then reactive with a wide range of chemical reagents, especially 
isothiocyanates. Thereby, common antibody-recognised ligands such as 
dinitrophenol and fluorescein can be attached to the N terminus for subsequent 
fractionation using an antibody affinity reagent. For example, the commonly used 
Edman reagent phenyl isothiocyanate can be used to specifically attach to the N 
terminus of proteins and can be derivatised if necessary with a moiety provided for 
subsequent binding to an affinity reagent. For chemical tagging of the C terminus, 
methods based on carbodiimide activation are commonly used to introduce ligands 
which are bound by affinity reagents. Alternatively, addition of moieties to the C 
terminus of proteins has been described using reverse proteolysis whereby certain 
proteases such as carboxypeptidase Y and lysyl endopeptidase can work in reverse 
to add chemical tags, commonly by way of amino acids either as derivatised amino 
acids with tags for binding to an affinity reagent or by way of natural sequences of 
amino acids which can then be specifically bound by an affinity reagent. It will be 
recognised that a wide range of internal amino acids can also be chemically tagged 
including Lys via the ε-amino group, Glu / Asp via the carboxyl group, Cys via the 
thiol group, Ser / Thr via the hydroxyl group and Tyr via the hydroxyphenyl group. 
Specific derivatisations of most other amino acids have been described. It will also 
be recognised that post-translational protein modifications can be used for addition of 
chemical tags, especially with glycosylation where the sugar residues are commonly 
oxidised by periodate to form aldehyde groups which can then react with amine-containing 
molecules. Other modifications which can be used to add chemical tags 
include lipidation, phosphorylation and metal ion addition. It will be recognised that 
there are a large number of methods in the art for introducing one or more chemical 
tags at specific sites within protein molecules or peptides. 

Affinity reagents for use in the present invention are commonly monoclonal 
antibodies. For specific sequences or structures within proteins or peptides, a library 
of recombinant antibody binding sites, usually in the form of Fab's, Fv's or single-chain 
Fv's, is used where commonly the antibody binding sites are "displayed" using, 
for example, bacteriophage or ribosome complexes such that the gene encoding 
individual antibody binding sites can be recovered. For use in the present invention, 
libraries of antibody binding sites can be dispersed into groups, for example by 
picking and arraying phage plaques or picking and arraying genes in vectors for 
ribosome display. Such pools will usually contain antibody binding sites for several 
proteins or peptides such that the pools can be used for fractionation. Alternatively, 
the protein or peptide mixture to which libraries of antibody affinity reagents are 
required can be immobilised and used as the target for the pre-selection of suitable 
affinity reagents which are then dispersed into pools or used as individual reagents. 
For chemical tags, individual monoclonal antibodies are used to specifically bind to 
individual tags in order to achieve subsequent fractionation. 



The present invention includes the use of affinity reagents other than monoclonal 
antibodies where such reagents can facilitate the fractionation of peptides or proteins 
prior to mass analysis. Such affinity reagents would include molecules of the 
immune system which selectively bind certain peptides, such as major histocompatibility 
proteins and T cell receptors. Other affinity reagents would include protein domains 
commonly involved in protein-protein binding interactions such as SH1 domains and 
nucleic acids such as RNA molecules (aptamers) which can selectively bind to 
proteins or DNA molecules such as transcription control sequences which bind to 
transcription factor proteins. Included in the present invention is the concept of 
cyclising peptides including within mixtures and especially when bound to solid 
phases by, for example, linking cysteine residues under reducing conditions. One 
method for this would be to add an additional cysteine residue at an exposed N or C 
terminal on immobilised peptides using, for example for C terminal immobilised 
peptides, standard conditions of peptide synthesis or using reverse proteolysis 
whereby certain proteases such as carboxypeptidase Y and lysyl endopeptidase can 
work in reverse to add amino acids. 
Included in the invention is also an elegant method for further fractionating proteins 
or peptides by adding, usually at the N terminus, amino acids which form part of the 
recognition sequence of a protease which specifically cleaves at a recognition 
sequence of two or more amino acids whereby one or more terminal amino acids in the 
protease recognition site is provided by the starting protein or peptide. In this 
manner, only a fraction of the proteins or peptides to which the new amino acids are 
added will then be subject to terminal protease cleavage by virtue of the newly 
created sequence. In this manner, proteins or peptides can be tagged with additional 
amino acids, usually at the N terminus, creating, in a fraction of the thus tagged 
mixture, a specific protease cleavage site. The proteins or peptides can then, for 
example, be immobilised via the new terminus, for example using a tagged terminal 
amino acid or by adding a chemical tag to the terminus, whereby an affinity reagent 
is then used to immobilise the tagged moieties. After removing non-immobilised 
untagged molecules, the proteins or peptides can then be subjected to cleavage with 
the specific protease which will then only cleave where the cleavage site has been 
generated by a combination of synthesis-derived amino acids and the original 
protein or peptide-derived amino acids. The cleaved peptides can then be mass 
analysed (or further processed prior to mass analysis) thus representing a subset of 
the peptide mixture. By using parallel synthesis of specific amino acids to exposed 
termini followed by immobilisation and cleavage, large mixtures of proteins or 
peptides can be fractionated on the basis of their terminal amino acid(s). An 
example of a protease recognition site is ile, glu, gly, arg which is cleaved between 
gly and arg by Factor Xa. The sequence ile, glu, gly could be synthesised onto the N 
terminus of a protein or peptide and thus if the adjacent amino acid in the protein or 
peptide sequence were arg, the cleavage site would be created and could be cleaved 
by Factor Xa. Other examples of protease cleavage sites are asp, asp, asp, asp, lys, 
cleaved by enterokinase between asp and lys; pro, gly, ala, ala, his, tyr cleaved 
between his and tyr by genenase I; leu, val, pro, arg, gly, ser cleaved between arg and 
gly by thrombin. N terminal addition of partial sequence asp, asp, asp, asp could be 
used to identify proteins or peptides with N terminal lys (cleaved by enterokinase), 
pro, gly, ala, ala, his to identify proteins/peptides with N terminal tyr (cleaved by 
genenase I), leu, val, pro, arg to identify N terminal gly, ser; or leu, val, pro, arg, gly to 
identify N terminal ser (cleaved by thrombin). Other proteases such as the MMPs 
(matrix metalloproteinases) with specific recognition sites could be used to 
fractionate proteins with other N terminal amino acids. Different protease 
recognition sites could thus be used in combination with the proteases to fractionate 
proteins or peptides according to the N terminal amino acid. Where proteins are 
used as the starting material, especially from mammalian cells whereby the N 
terminal amino acid is methionine, this can be removed if required by, for example, 
formylation and cleavage by a bacterial protease specific for removal of terminal 
formylmethionine. 
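
The site-completion logic described above can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: the function names and the toy peptides are hypothetical, the recognition sites are written in one-letter code exactly as listed in the text, and the sketch only tests whether the added residues plus the peptide's own N terminal residues spell out a complete recognition site. It does not model where within the site the protease cuts.

```python
# Recognition sites as listed in the text, in one-letter amino acid code.
RECOGNITION_SITES = {
    "Factor Xa": "IEGR",       # ile, glu, gly, arg
    "enterokinase": "DDDDK",   # asp, asp, asp, asp, lys
    "genenase I": "PGAAHY",    # pro, gly, ala, ala, his, tyr
    "thrombin": "LVPRGS",      # leu, val, pro, arg, gly, ser
}

def completes_site(added: str, peptide: str, site: str) -> bool:
    """True if the added residues plus the peptide's N terminus spell the site."""
    # The added residues must be a proper prefix of the site, so that at
    # least one residue of the site is supplied by the peptide itself.
    if not site.startswith(added) or len(added) >= len(site):
        return False
    return (added + peptide).startswith(site)

def fractionate(peptides, added, site):
    """Split peptides into (cleavable, uncleavable) after N terminal extension."""
    cleavable = [p for p in peptides if completes_site(added, p, site)]
    uncleavable = [p for p in peptides if not completes_site(added, p, site)]
    return cleavable, uncleavable

# Hypothetical toy peptides, not real protein sequences.
peptides = ["RTAGLK", "KVDEAR", "GSWTLK", "SAVLPE", "YMNDQR"]

# Adding ile, glu, gly selects peptides with N terminal arg.
print(fractionate(peptides, "IEG", RECOGNITION_SITES["Factor Xa"])[0])
# Adding leu, val, pro, arg selects peptides with N terminal gly, ser.
print(fractionate(peptides, "LVPR", RECOGNITION_SITES["thrombin"])[0])
```

Running several such extensions in parallel, one per partial recognition sequence, partitions the mixture according to the N terminal residue(s) of each peptide.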

Affinity reagents are an important aspect of the present invention and can be used 
either for broad fractionation of groups of proteins/peptides or for specific 
fractionation of individual proteins/peptides. For fractionation, it is first necessary 
to prepare fractions of, or individual, affinity reagents which bind to a specific 
fraction or specific peptide and not to other fractions/peptides. A convenient method 
is to fractionate the proteins or peptides prior to isolation of the affinity reagents. In 
the case of antibodies as the affinity reagents, such proteins/peptides can then be 
used either to bind displayed antibodies from a library or to immunise 
animals for generation of antisera. Where a library of recombinant antibody binding 
sites such as single-chain Fv's is used, gene clones encoding these can be retrieved 
after binding to protein/peptide fractions, providing a replicable source of the affinity 
reagents for subsequent isolation of the specific protein/peptide fraction. Individual 
single-chain Fv's may, in parallel, be screened for binding specificity, for example 
by analysing peptide binding by MALDI-ToF. In this case, single-chain Fv's which 
bind to a single peptide from a large protein mixture are retained (in practice, those 
binding up to three peptides are also retained) as gene clones for subsequent 
individual use or use within a mixture of Fv's for isolation of a protein/peptide 
fraction from the mixture. It will be appreciated that free N termini from proteins 
are often good targets for isolation of very specific antibodies and therefore capture 
and release of N terminal peptides from a protein will particularly favour subsequent 
antibody isolation. Certain Fv's may be useful for the elimination of abundant 
proteins or peptides from the mixture. It will be appreciated that retention and 
characterisation of the binding of single-chain Fv's may also provide a means to 
reduce redundancy by eliminating Fv's with the same specificity as other Fv's. 

The various aspects of the invention cover combinations of protein or peptide 
fractionation and subsequent mass analysis with a preferable step of fractionation 
using affinity tags for specific sequences or structures in the proteins or peptides, 
and an optional step of chemical tagging with fractionation by virtue of these tags. 

The current invention should be considered to encompass these and related 
protein/peptide processing steps with the core objective of reducing the complexity 
of protein mixtures in order to achieve mass analysis of the resultant protein/peptide 
fractions. 

A major method for operation of the invention involves tagging a mixture of 
proteins (either natural or encoded by cDNA libraries), fractionating the protein or 
derived peptide fragments by immobilising the proteins or peptide fragments using 
solid-phase affinity reagents specific for the tags, and releasing and subjecting the 
proteins or peptides to mass analysis. Alternatively, the N or C termini of the 
proteins or peptides may be modified by addition of amino acids prior to cleavage 
with a sequence-specific protease or fractionation with an affinity agent specific for 
the modified termini. Prior to mass analysis, proteins or peptides may alternatively 
be used to bind antibodies whereby these antibodies have been pre-selected to 
fractionate the peptides or are themselves retained as affinity reagents. The mixture 
of proteins may be pre-fractionated, for example by size, or may be produced from 
cDNA libraries which are pre-fractionated by segregation of clones. The retained 
affinity reagents are then used to analyse complex samples of proteins whereby the 
antibodies are used to bind peptides which are then mass analysed. 

It will be appreciated that many of the same principles described herein for the mass 
analysis of natural proteins or peptides derived from natural protein populations may 
also be used to analyse recombinant protein populations. One particularly favoured 
application is for the isolation of recombinant antibodies such as single-chain Fv's to 
specific target antigens, especially where the antibodies are derived from human 
genes whereby the selected antibody may be suitable for human therapeutic or 
diagnostic use. In this particular application, an extensive gene library of single-chain 
Fv's is created from a pool of immunoglobulin cDNA's such as those derived 
from peripheral blood B cells in humans. If this gene library is created in such a 
manner that a random (or semi-random) gene sequence is included within the single-chain 
Fv coding region, then such a random/semi-random gene sequence will 
generate a random/semi-random peptide sequence in individual single-chain Fv's. 
Such a random/semi-random gene sequence can be created using standard methods 
such as PCR whereby a random/semi-random synthetic oligonucleotide sequence 
would be used as one of a pair of primers used to amplify immunoglobulin gene 
fragments during the creation of the single-chain Fv gene library. If the library was 
created appropriately, the resultant single-chain Fv's would each include a "peptide 
tag" unique to that particular Fv. Preferably, the peptide tag would be C terminal to 
the single-chain Fv region and include, flanked between itself and the single-chain 
Fv region, one or more protease sensitive sites such as sites for Arg-C or Glu-C 
endopeptidase. If a mixture of such single-chain Fv's was produced from a suitable 
gene library, then this mixture could be mixed with a target antigen (or antigens 
such as on cells), usually where the antigen is immobilised. This would result in 
specific single-chain Fv's binding to the target antigen with non-binders (or weak 
binders depending on the stringency of washing) being washed away. Having 
washed away excess antibodies, the remaining antigen/single-chain Fv complex 
would then be digested with the endoprotease used to cleave the introduced protease 
sensitive site. This would release the tagged peptide which can then be subjected to 
mass analysis / mass spectrometry sequencing. Having determined the sequences of 
tags derived from bound single-chain Fv's, corresponding synthetic oligonucleotides 
can then be produced and used to specifically amplify specific single-chain Fv genes 
from the library. These specific single-chain Fv genes can then be further used to 
generate corresponding single-chain Fv's which could then be retested for antigen 
binding either individually or as part of a small pool of isolated single-chain Fv's. 
Ultimately, by this method, specific single-chain Fv's can be generated with 
desirable antigen binding properties and, if from a human source, potential clinical 
utility. 

It will be appreciated that many of the same principles described herein for the 
digestion/cleavage, fractionation and mass analysis of proteins can also be applied to 
other polymeric molecules such as DNA or RNA. In the case of DNA or RNA, free 
phosphate and hydroxyl groups at the 5' and 3' termini respectively provide a means 
for very specific addition of chemical tags or direct binding to a solid phase. 
Sequence specific restriction or modification enzymes provide for cleavage or 
modification of DNA molecules. Useful affinity reagents for DNA or RNA are 
nucleic acids themselves which can be specifically hybridised to a complementary 
DNA or RNA sequence with attachment to a solid phase either before or after 
hybridisation. Using such methods, complex mixtures of nucleic acids can be 
fractionated and then subjected to mass analysis, especially using mass spectrometry. 

The invention is illustrated by the following examples which should not be 
considered as limiting in scope: 

Example 1 

In this example, human p53 protein was modified with a chemical tag at its N 
terminus, cleaved with a protease, the chemically tagged peptide then recovered 
using a tag-specific monoclonal antibody and the peptide then analysed by MALDI-ToF. 
p53 protein was a gift from Dr Borek Vojisek (University of Brno, Czech 
Republic). 100ug of p53 protein was treated with the succinimide ester of (methyl sulphonyl) 
ethyl carbonate according to Mikolajczyk et al., Bioconjugate Chem., vol 7 (1996) 
p150-158 in order to block lysine side-chains. The blocked protein was dissolved at 
1mg/ml in 0.1M sodium bicarbonate buffer pH8.5 and NHS-SS-biotin (Pierce, 
Chester, UK) was added to 100ug/ml final. The reaction was carried out for 6 hours 
at room temperature and terminated with ethanolamine. The protein mixture was 
then passed down a Sephadex G25 column (Pharmacia, Milton Keynes, UK) in PBS 
and the void volume collected using A280 measurements of the eluates. 40ul of 
eluate containing 2ug p53 was then heat denatured (95°C for 5 mins), cooled to 37°C 
and 1ug endoproteinase Arg-C (from C. histolyticum, Calbiochem, Nottingham, UK) 
was added and the mixture incubated at 37°C for 1 hour. Then 10ul of streptavidin-agarose 
(Sigma, Poole, UK) in PBS was added and the mixture shaken for 10 
minutes. The agarose was pelleted at 16000g for 1 min and washed three times in 
TSO buffer (75mM Tris.HCl, 200mM NaCl, 0.5% N-octyl glucoside, pH8) and 
three times in TSMK (10mM Tris.HCl, 200mM NaCl, 5mM 2-mercaptoethanol, 
pH8). Finally, 10ul of a saturated solution of alpha-cyano-4-hydroxycinnamic acid 
in 1% aqueous trifluoroacetic acid/acetonitrile (1:1 v/v) was added to the washed 
beads and 1ul of this was loaded onto the mass spectrometer chip. The analysis was 
carried out using a Perseptive Biosystems Voyager-DE STR Biospectrometry 
Workstation (Perseptive Biosystems). The mass spectra were collected by adding 
spectra from 200 laser shots. 

The results showed a major peak corresponding to the 65 amino acid N terminal 
Arg-C endoprotease fragment with no significant levels of other p53 Arg-C peaks. 
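
The expected outcome of this example can be pictured with a small in-silico digest. The sketch below is illustrative only and rests on stated simplifications: Arg-C is modelled as cutting on the C terminal side of every arginine with no missed cleavages, only the N terminus is assumed to carry the biotin tag (lysine side-chains having been blocked), and the sequence shown is a hypothetical toy sequence rather than p53. The function names are likewise hypothetical.

```python
def arg_c_digest(sequence: str) -> list[str]:
    """Split a one-letter protein sequence after every arginine (R)."""
    fragments = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue == "R":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    # Keep any C terminal remainder that does not end in arginine.
    if start < len(sequence):
        fragments.append(sequence[start:])
    return fragments

def recover_tagged(fragments: list[str]) -> list[str]:
    """Model affinity capture: only the N terminal (tagged) fragment is retained."""
    return fragments[:1]

# Hypothetical toy sequence, not a real protein.
toy_protein = "MKTAYRGGSLRDEVK"
fragments = arg_c_digest(toy_protein)
print(fragments)
print(recover_tagged(fragments))
```

In this idealised picture a single fragment is recovered per protein, whatever the number of internal cleavage sites.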

Example 2 

The method of example 1 was repeated except that the N terminal biotin-tagged 
peptide was used to isolate a single-chain Fv antibody fragment from a phage 
display library of single-chain Fv's. Subsequently, the single-chain Fv was used to 
isolate the N-terminal peptide fragment from a protease digest of the test protein as 
confirmed by MALDI-ToF. An extract of normal human brain, prepared as in 
example 4, was conjugated to KLH according to Harlow and Lane, "Antibodies" 
(1988) (Cold Spring Harbor Publications) and used to immunise two Balb/c mice. Two 
doses were given intra-peritoneally with an interval of 4 weeks between them. Three to four 
days after the second inoculation, the mice were sacrificed and spleens removed by 
dissection. Spleen mRNA preparation was then initiated using the QuickPrep mRNA 
purification kit (Pharmacia) according to the manufacturer's instructions. 

The Pharmacia Recombinant Phage Antibody System (Pharmacia) was used to 
produce a library of mouse single chain Fv's (ScFv). First-strand cDNA was 
generated from the mRNA using M-MuLV reverse transcriptase and random 
hexamer primers. Antibody heavy and light chain genes were then amplified using 
specific heavy and light chain primers complementary to conserved sequences 
flanking the antibody variable domains. The 340 and 325 base pair products 
generated for heavy and light chain DNA respectively were separately purified 
following agarose gel electrophoresis. These were then assembled into a single 
ScFv construct using a DNA linker-primer mix to give the VH region joined by a 
(Gly4Ser)3 peptide to the VL region. The assembled ScFv were amplified with 
primers designed to insert SfiI and NotI sites at the 5' and 3' ends respectively, 
giving an 800 bp product. This fragment was purified, sequentially digested with 
SfiI and NotI, and repurified. The fragment was then ligated into SfiI and NotI cut 
pCANTAB 5 phagemid vector. pCANTAB 5 contains the gene encoding the phage 
gene 3 protein (g3p) and the ScFv is inserted adjacent to the g3 signal sequence 
such that it will be expressed as a g3p fusion protein. Competent E.coli TG1 cells 
were transformed with the pCANTAB 5/ScFv phagemid then subsequently infected 
with the M13KO7 helper phage. The resulting recombinant phage contained DNA 
encoding the ScFv genes and displayed one or more copies of recombinant antibody 
as fusion proteins at their tips. 

Phage-displayed ScFv that bind to the peptide were then selected or enriched by panning. 
Briefly, the biotinylated and protease treated p53 preparation from example 1 was 
applied to a streptavidin-coated glass slide (Radius Biosciences, Waltham, USA) and 
the slide was washed four times in PBS. After blocking with 2% non-fat dry milk in 
PBS, the phage preparation was applied and incubated for 1 hour. After washing 10 
times with TBS/0.05% Tween 20, peptide reactive recombinant phage were detected 
with horseradish peroxidase conjugated anti-M13 antibody and revealed with 
o-phenylene diamine chromogenic substrate. These phage were subsequently eluted 
with 0.1M glycine.HCl pH2.2 and 1mg/ml BSA and neutralised with 2M Tris base. 
The eluted phage were amplified in JM103 grown in 25ml J broth. Two additional 
rounds of panning were undertaken and finally 10 single plaques were isolated, 
pooled and further amplified. An aliquot of 10*^ amplified phage was incubated for 
2 hours at 4°C with 0.1ug of biotinylated and endoproteinase Arg-C digested p53 in 
TSO buffer. After 2 hours, 0.5ug of anti-M13 (Pharmacia) in TSO was added and 
incubated for 1 hour following which 5ul of protein A/G agarose (Sigma) was added 
and the mixture incubated for a further 0.5 hours with swirling. The agarose beads 
were then pelleted, washed as in example 1 above and analysed by mass 
spectrometry. 

The results showed the same major peak as in example 1 corresponding to the 65 
amino acid N terminal Arg-C endoprotease fragment. 

Example 3 

In this example, a gene fragment encoding a test protein was subjected to priming 
with a synthetic oligonucleotide encoding a polyhistidine tag. The cDNAs were 
expressed by in vitro transcription and translation (IVTT) and the tagged peptide 
fragments were then isolated using a nickel chelate column. These fragments were 
then used to isolate a single-chain Fv antibody fragment. Subsequently, the single-chain 
Fv was used to isolate a peptide fragment from a protease digest of the test 
protein as confirmed by mass spectrometry. 

Example 4 

The method of example 2 was repeated using a total protein preparation from cells 
and the chemically tagged peptides were used to isolate a collection of single-chain 
Fv antibody fragments. Subsequently, a mixture of twelve of these single-chain Fv's 
was used to isolate peptide fragments from a protease digest of the test protein and 
these were analysed by mass spectrometry. 

Example 5 

In this example a single-chain antibody library was produced including a unique 
sequence signature tag. Human peripheral blood lymphocyte RNA was prepared 
according to standard procedures. Briefly, lymphocytes were prepared from 10ml 
heparinised blood taken from 16 normal healthy donors. Lymphocytes were 
collected following a density gradient centrifugation procedure using Lymphoprep 
medium (Sigma, Poole, UK). RNA was prepared using the QuickPrep system and 
instructions provided by the supplier (Pharmacia, St Albans, UK). Synthesis of 
cDNA was conducted using a cDNA synthesis kit (Pharmacia, St Albans, UK) and 
random hexamer primers with conditions recommended by the supplier. 

Immunoglobulin heavy chain variable region (Vh) and light chain variable regions 
(Vl) were amplified from cDNA in separate PCR mixes using primer sets designed 
to maximise Vh and Vl repertoires. Primer sets were as described previously (Marks 
J.D. et al 1991, Eur. J. Immunol. 21: 985). Vh and Vl PCR reactions were 
conducted using 2.6 units of Expand High Fidelity PCR enzyme mix (Boehringer 
Mannheim, Lewes, UK), Expand HF buffer (Boehringer), 1.5 mM MgCl2, 200 uM 
deoxynucleotide triphosphates (dNTPs) (Life Technologies, Paisley, UK) and 25 
pmoles of each primer pool. Cycles were 96°C 5 minutes, followed by [95°C 1 
minute, 50°C 1 minute, 72°C 1 minute] times 5, [95°C 45 seconds, 50°C 1 minute, 
72°C 1 minute 30 seconds] times 8, [95°C 45 seconds, 50°C 1 minute, 72°C 2 
minutes] times 5, finishing with 72°C 5 minutes. 
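
The thermal cycling programme above can be written down as a small data structure, which makes the total number of amplification cycles and the minimum run time easy to check. This is a sketch only: the layout and function names are hypothetical, the temperatures and hold times are as read from the text (the annealing temperature is taken to be 50°C throughout), and ramp times between steps are ignored.

```python
# Each step is (temperature in °C, hold time in seconds), as read from the text.
PROGRAM = [
    {"repeats": 1, "steps": [(96, 300)]},                       # initial denaturation
    {"repeats": 5, "steps": [(95, 60), (50, 60), (72, 60)]},
    {"repeats": 8, "steps": [(95, 45), (50, 60), (72, 90)]},
    {"repeats": 5, "steps": [(95, 45), (50, 60), (72, 120)]},
    {"repeats": 1, "steps": [(72, 300)]},                       # final extension
]

def amplification_cycles(program) -> int:
    """Count repeats of blocks containing a denature/anneal/extend triplet."""
    return sum(block["repeats"] for block in program if len(block["steps"]) == 3)

def total_hold_seconds(program) -> int:
    """Sum of all hold times, ignoring ramping between temperatures."""
    return sum(
        block["repeats"] * sum(seconds for _, seconds in block["steps"])
        for block in program
    )

print(amplification_cycles(PROGRAM))
print(total_hold_seconds(PROGRAM) / 60)
```

On these figures the programme comprises 18 amplification cycles, with the extension time lengthened in the later blocks.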

In a separate PCR, a linker fragment of form (Gly4Ser)3 (Huston J.S. et al 1988, 
PNAS, 85: 5879-5883) was amplified from a cloned template pSW1-ScFvD1.3 
(McCafferty et al, 1990, Nature 348: 552-554) using primer sets detailed previously 
(Marks, J.D. in Antibody Engineering, ed Borrebaeck C.A.K., New York O.U.P., 
1995). The 93 bp linker fragment product was annealed together with an equimolar 
mixture of the Vh and Vl PCR products. The mixture was further amplified in a 
"pull through" reaction using flanking primers HuVHBACKsfi and HuFORNot as 
detailed in Vaughan et al (Vaughan T.J. et al 1996, Nature Biotech. 14: 309-314). 
All fragments used in the pull-through reaction were purified free of their initial 
primers prior to inclusion in the reaction. Purification was conducted using the 
Wizard PCR Preps system from Promega (Promega, Southampton, UK). 

The assembled contig of form Vh-linker-Vl was digested with restriction enzymes 
SfiI and NotI (Boehringer) using standard conditions and purified as above. The 
purified fragment was annealed with a double stranded synthetic oligonucleotide 
adapter mix designed to introduce a V8 protease cleavage site juxtaposed with a 
tract of randomised sequence in frame with the C-terminus of the Vl gene. This 
V8/unique sequence tag was produced by annealing a pair of synthetic 
oligonucleotide pools of form 
5'-ggccgcgaggaagaggaa[(atg)/(can)/(agn)/(aan)/(gan)/(ttn)]2gc-3' and 
5'-ggccgc[(naa)/(ntc)/(ngt)/(nct)/(nag)/(cat)]2ctccttctcctcgc-3'. This linker has NotI 
compatible ends (underlined) and therefore facilitates the insertion of the complete 
single chain antibody-V8/unique sequence tag fragment into SfiI-NotI prepared 
pCANTAB 5 (Pharmacia) phagemid vector. 

The unique sequence tag was designed to avoid the introduction of stop codons and 
further biased to exclude encoding residues with greater than two alternative codons. 
By this strategy, the number of specific oligonucleotides required to identify a given 
de-coded peptide sequence is minimised. In all, the unique sequence tag is able to 
encode 11 of the 20 amino-acids. In addition to the V8 peptidase cleavage site (a 
string of 4 glutamic acid residues), the sequence tag is 12 codons long. Thus, from 
the repertoire of 11 amino acids (10 of which are encoded by either of two codons), 
the tag is able to encode 11^12 (approximately 3.1x10^12) different peptides. 
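
The coding capacity of the tag can be checked by expanding the six degenerate codon families named in the oligonucleotide design. The following is a minimal sketch under stated assumptions: the codon table lists only the codons reachable by this scheme (standard genetic code), the tag is taken to be 12 randomised codons each drawn independently from these families, and the helper names are hypothetical. The function `oligos_for` counts the exact coding sequences compatible with a decoded tag within the scheme, as a rough indication of how many specific oligonucleotides might be needed.

```python
from itertools import product

# Standard genetic code, restricted to codons this tag scheme can produce.
CODON_TABLE = {
    "atg": "M",
    "cat": "H", "cac": "H", "caa": "Q", "cag": "Q",
    "agt": "S", "agc": "S", "aga": "R", "agg": "R",
    "aat": "N", "aac": "N", "aaa": "K", "aag": "K",
    "gat": "D", "gac": "D", "gaa": "E", "gag": "E",
    "ttt": "F", "ttc": "F", "tta": "L", "ttg": "L",
}
STOP_CODONS = {"taa", "tag", "tga"}
CODON_FAMILIES = ["atg", "can", "agn", "aan", "gan", "ttn"]  # from the text
TAG_LENGTH = 12  # codons, as stated in the text

def expand(family: str) -> list[str]:
    """Expand each 'n' position to all four bases."""
    choices = ["acgt" if base == "n" else base for base in family]
    return ["".join(codon) for codon in product(*choices)]

codons = [codon for family in CODON_FAMILIES for codon in expand(family)]
amino_acids = {CODON_TABLE[codon] for codon in codons}

def oligos_for(peptide: str) -> int:
    """Number of exact coding sequences for a decoded tag within this scheme."""
    count = 1
    for residue in peptide:
        count *= sum(1 for codon in codons if CODON_TABLE[codon] == residue)
    return count

print(len(codons), len(amino_acids), len(amino_acids) ** TAG_LENGTH)
print(oligos_for("MHQSRNKDEFLM"))
```

Because no residue in the scheme has more than two codons, a decoded 12 residue tag corresponds to at most 2^12 coding sequences, far fewer than would be needed if all synonymous codons of the full genetic code were allowed.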

The assembled scFv fragment (Vh-linker-Vl) with SfiI and NotI prepared ends was 
annealed and ligated to the NotI sequence-tag adapter and re-purified. For 
experiments expressing the human scFv library by phage display, the complete 
fragment was ligated into SfiI-NotI prepared pCANTAB 5 (Pharmacia) phagemid 
vector, and transformed into competent TG1 E. coli. 

For other experiments using in vitro transcription and translation (IVTT), the 
assembled scFv library was subcloned into SfiI-NotI prepared pCANTAB5-T7. This 
vector is the same as the commercially available pCANTAB5 except that it was 
modified to include the T7 promoter sequence (ttaatacgactcactata) inserted at the 
HindIII site at position 2235. The modification was achieved by ligation of a 
double-stranded synthetic DNA linker of sequence 5'-agctaatacgactcactata into 
HindIII cut and dephosphorylated pCANTAB5. Recombinant clones containing the 
T7 promoter were selected using a diagnostic PCR. 

Following ligation and transformation into competent TG1 E. coli, cells were grown 
for 1 hour in 1ml of SOC medium and then plated onto TYE medium with 100µg/ml 
ampicillin. Colonies were scraped off plates into 5ml of 2x TY broth containing 
ampicillin. The cultured library was used to prepare DNA for IVTT reactions. 

The pCANTAB5-T7 scFv library DNA was used in an in vitro translation reaction. 
The IVTT was conducted using the T7 Quick coupled transcription translation mix 
(Promega, Southampton, UK) and 10µg of the pCANTAB5-T7 scFv library DNA in a 
total volume of 50µl. The translation reaction was conducted at 30°C for 90 minutes 
then placed on ice. In some experiments reactions were monitored for the presence 
of translation products using 35S-methionine incorporation assays. Reactions were 
stored at -70°C prior to use in binding and screening assays. 

The single-chain antibody library was used in a binding reaction to recombinant 
human p53 protein (Oncogene Research Products-Calbiochem, Nottingham, UK). 
The IVTT mix was diluted 10-fold in PBS and used in a binding assay to human 
recombinant p53 protein immobilised in a 96-well microplate. The p53 protein was 
immobilised by overnight incubation at a concentration of 100µg/ml in phosphate 
buffer at 4°C. The plate was washed using PBS 0.5% (w/v) BSA and the diluted 
IVTT mix added to the test and control wells for binding. The binding reaction was 
conducted at 2TC for 90 minutes. The plate was washed x3 using PBS-T (PBS + 
0.05% v/v Tween-20) and subjected to V8 protease digestion (Takara, Wokingham, 
UK). Protein fragments were collected from the supernatant and size fractionated to 
exclude the V8 protease and other large species before analysis by MALDI-tof. 
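
As an illustration of the mass values such an analysis works with, the sketch below computes the monoisotopic mass and singly protonated m/z of a tag peptide from standard monoisotopic residue masses. It is a hedged example only: the three tag sequences are hypothetical, the function names are illustrative, and no assumption is made here about which fragment the V8 digestion actually releases.

```python
# Monoisotopic residue masses in Da for the 11 residues the tag can encode.
RESIDUE_MASS = {
    "S": 87.03203, "L": 113.08406, "N": 114.04293, "D": 115.02694,
    "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111,
}
WATER = 18.01056   # added once per linear peptide (free N- and C-termini)
PROTON = 1.00728   # charge carrier for an [M+H]+ ion


def monoisotopic_mass(peptide):
    """Neutral monoisotopic mass of a linear, unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER


def mz_singly_protonated(peptide):
    """m/z of the [M+H]+ ion, the species typically observed by MALDI."""
    return monoisotopic_mass(peptide) + PROTON


# Hypothetical tag peptides, for illustration only.
for tag in ("MHSNDFLRKQES", "MHSNDFLRQKES", "MHSNDFLRKKES"):
    print(tag, round(mz_singly_protonated(tag), 4))
```

The first two hypothetical tags have the same composition and therefore the same mass, and lysine and glutamine differ by only about 0.036 Da, so a mass value on its own constrains the composition of a tag more tightly than its sequence. 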

MALDI-tof fragment analysis identified a number of peptide fragments. The 
peptide sequences were used to design a set of corresponding synthetic 
oligonucleotides. The oligonucleotides were used in a PCR based screen of the 
single chain library. Pfu turbo (Stratagene Europe) DNA polymerase was used to 
synthesise complementary strands in members of the human single-chain antibody 
library DNA. Following 15 rounds of thermal cycling, the product was subjected to 
DpnI digestion. This step depleted the mixture of parental plasmid molecules to 
ensure that only the newly synthesised primed products were propagated. 1µl of the 
reaction was transformed into TG1 competent cells and plated onto LB plates 
containing 100µg/ml ampicillin. Individual clones were picked, expanded and DNA 
prepared according to standard procedures. The DNA was used directly in a second 
round of screening involving IVTT, antigen binding, V8 protease digestion and 
MALDI-tof fragment analysis. After 2 rounds of selection, 6 scFvs were isolated 
which bound recombinant p53. 
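
The step from a decoded peptide sequence to candidate oligonucleotides can be sketched as a back-translation restricted to the codons permitted by the tag design. This is an illustrative sketch under stated assumptions, not the procedure actually used: the codon sets are inferred from the patterns atg, can, agn, aan, gan and ttn, the peptide "MHSN" is hypothetical, and only the coding strand is produced.

```python
from itertools import product

# Codons available to each residue within the tag design (assumed from the
# patterns atg/can/agn/aan/gan/ttn under the standard genetic code).
TAG_CODONS = {
    "M": ["ATG"],
    "H": ["CAT", "CAC"], "Q": ["CAA", "CAG"],
    "S": ["AGT", "AGC"], "R": ["AGA", "AGG"],
    "N": ["AAT", "AAC"], "K": ["AAA", "AAG"],
    "D": ["GAT", "GAC"], "E": ["GAA", "GAG"],
    "F": ["TTT", "TTC"], "L": ["TTA", "TTG"],
}

# Same information as one degenerate codon per residue, using the IUPAC
# ambiguity codes Y (C or T) and R (A or G).
DEGENERATE_CODON = {
    "M": "ATG", "H": "CAY", "Q": "CAR", "S": "AGY", "R": "AGR", "N": "AAY",
    "K": "AAR", "D": "GAY", "E": "GAR", "F": "TTY", "L": "TTR",
}


def degenerate_oligo(peptide):
    """Single degenerate coding-strand oligonucleotide covering the peptide."""
    return "".join(DEGENERATE_CODON[residue] for residue in peptide)


def count_specific_oligos(peptide):
    """Number of fully specified oligonucleotides that could encode the peptide."""
    count = 1
    for residue in peptide:
        count *= len(TAG_CODONS[residue])
    return count


def specific_oligos(peptide):
    """Enumerate every fully specified coding-strand oligonucleotide."""
    return ["".join(codons)
            for codons in product(*(TAG_CODONS[residue] for residue in peptide))]


peptide = "MHSN"  # hypothetical decoded fragment, for illustration only
print(degenerate_oligo(peptide))
print(count_specific_oligos(peptide), "specific oligonucleotides")
print(specific_oligos(peptide))
```

Because every residue in the tag is limited to at most two codons, a 12-residue tag corresponds to at most 2^12 = 4096 specific oligonucleotides, whereas in the unrestricted genetic code serine, arginine and leucine each have six codons. 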
